| Season | MLB Total Home Runs |
|---|---|
| 2015 | 4909 |
| 2016 | 5610 |
| 2017 | 6105 |
| 2018 | 5585 |
| 2019 | 6776 |
| 2021 | 5944 |
| 2022 | 5215 |
| 2023 | 5868 |
2025-06-01
Media and fans are fascinated with streaky patterns of hitting
Does “Streaky Hitting Ability” exist?
Or maybe we see the kind of streaky patterns that arise by chance when flipping coins, each with its own probability of heads.
What is the reason for the sudden increase in home run hitting in recent seasons?
Are the hitters changing their approach?
Does the ball construction have an effect?
Other factors?
25 32 41 45 72 76 86 87 100 131 141 150 160 162 176 178 182 187 221 228 269 301 316 339 342 343 368 406 414 420 425 433 454 455 473 522 540 554 578 588 596 598 604 616 637 640 645 652
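Consistent model \(C\): the player has the same probability of success on every trial, \[p_1 = \cdots = p_n = p\]
Streaky model \(S\): the \(p_j\) are different and distributed according to a Beta(\(a, b\)) curve, for specified values of \(a\) and \(b\)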
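The Bayes factor in support of the streaky model \(S\) over the consistent model \(C\) is \[ BF = \frac{f(y \mid S)}{f(y \mid C)} \] where \(f(y \mid M)\) is the marginal likelihood of the observed outcomes \(y\) under model \(M\).
If \(\log BF > 0\), the data support true streakiness.
Here we say that a player is streaky if \(\log BF > 0.5\)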
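The Bayes factor has a closed form when a player's outcomes are grouped into windows. The sketch below is a minimal illustration, not the author's code, and it assumes: (i) the outcomes are split into consecutive windows, with `successes[j]` out of `trials[j]` in window \(j\); (ii) under \(S\) each window's \(p_j\) is drawn independently from Beta(\(a, b\)); (iii) under \(C\) the single \(p\) also has a Beta(\(a, b\)) prior. The function names, the window size, and the values \(a = 25\), \(b = 75\) are hypothetical.

```python
from math import lgamma


def log_beta(x: float, y: float) -> float:
    """Natural log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_bayes_factor(successes, trials, a, b):
    """log BF = log f(y | S) - log f(y | C) for windowed binary outcomes.

    Hypothetical setup: successes[j] out of trials[j] in window j.
    S: each window has its own p_j ~ Beta(a, b), drawn independently.
    C: a single p ~ Beta(a, b) shared by all windows.
    Binomial coefficients appear in both marginals and cancel.
    """
    # log f(y | S): one beta-binomial term per window
    log_f_s = sum(
        log_beta(a + y, b + n - y) - log_beta(a, b)
        for y, n in zip(successes, trials)
    )
    # log f(y | C): one beta-binomial term for the pooled totals
    total_y, total_n = sum(successes), sum(trials)
    log_f_c = log_beta(a + total_y, b + total_n - total_y) - log_beta(a, b)
    return log_f_s - log_f_c


def is_streaky(successes, trials, a, b, cutoff=0.5):
    """Apply the log BF > 0.5 rule."""
    return log_bayes_factor(successes, trials, a, b) > cutoff


windows = [20] * 6               # six hypothetical windows of 20 at-bats
steady = [5, 5, 5, 5, 5, 5]      # 30 hits, spread evenly
streaky = [10, 0, 10, 0, 10, 0]  # 30 hits, alternating hot and cold
print(log_bayes_factor(steady, windows, 25, 75))   # negative: favors C
print(log_bayes_factor(streaky, windows, 25, 75))  # positive: favors S
```

Grouping into windows matters: with one trial per window, both marginal likelihoods depend only on the total number of successes, so the Bayes factor could not respond to the order of the outcomes.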
In a particular season, we’ll find some streaky hitters
Maybe players are truly consistent, and we are observing “chance” streakiness due to multiplicity: when many hitters are examined, some will look streaky by chance alone.
Would a consistent model for hitting predict this observed streakiness?
Assume the consistent model, where each player has a single probability of success.
Estimate probabilities \(P_1, \ldots, P_N\) for \(N\) players using an exchangeable model.
Simulate binary outcomes from Bernoulli distributions using these probability estimates.
Using the Bayes factor, find the fraction of streaky hitters.
Set the definition of success (HIT or SO)
Simulate 50 replicated datasets from the predictive distribution of the fitted consistent model
For each season, plot the fraction of streaky hitters
Compare with the observed fraction
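The simulation check in the steps above can be sketched as a posterior predictive loop. This is an illustrative sketch under stated assumptions, not the analysis behind these slides: the exchangeable fit is replaced by simple shrinkage toward the pooled rate with a hypothetical prior sample size `k`, the data layout (`players` as a list of per-window successes and trials) is hypothetical, and the Bayes factor uses a windowed beta-binomial form with user-chosen \(a, b\).

```python
import random
from math import lgamma


def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_bf(successes, trials, a, b):
    """Windowed beta-binomial log Bayes factor of streaky S vs consistent C."""
    log_f_s = sum(log_beta(a + y, b + n - y) - log_beta(a, b)
                  for y, n in zip(successes, trials))
    ty, tn = sum(successes), sum(trials)
    log_f_c = log_beta(a + ty, b + tn - ty) - log_beta(a, b)
    return log_f_s - log_f_c


def shrink_estimates(players, k=200.0):
    """Hypothetical stand-in for an exchangeable fit: pull each player's
    success rate toward the pooled rate, with prior sample size k."""
    pooled = (sum(sum(y) for y, _ in players)
              / sum(sum(n) for _, n in players))
    return [(sum(y) + k * pooled) / (sum(n) + k) for y, n in players]


def observed_fraction(players, a, b, cutoff=0.5):
    """Fraction of players whose observed log BF exceeds the cutoff."""
    return sum(log_bf(y, n, a, b) > cutoff for y, n in players) / len(players)


def replicated_fractions(players, a, b, n_reps=50, cutoff=0.5, seed=1):
    """Fraction of streaky players in each dataset simulated from C."""
    rng = random.Random(seed)
    probs = shrink_estimates(players)
    fractions = []
    for _ in range(n_reps):
        n_streaky = 0
        for (_, trials), p in zip(players, probs):
            # Consistent model: every trial for this player uses the same p
            sim = [sum(rng.random() < p for _ in range(n)) for n in trials]
            if log_bf(sim, trials, a, b) > cutoff:
                n_streaky += 1
        fractions.append(n_streaky / len(players))
    return fractions


# Synthetic league: 30 players, 6 windows of 20 at-bats each
gen = random.Random(0)
players = []
for _ in range(30):
    trials = [20] * 6
    players.append(([sum(gen.random() < 0.25 for _ in range(n)) for n in trials],
                    trials))
print(observed_fraction(players, 25, 75))
print(replicated_fractions(players, 25, 75)[:5])
```

An observed fraction that sits inside the spread of the 50 replicated fractions is what the consistent model would predict; one well above them suggests streakiness beyond chance.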
Hit as success: the observed fraction of streaky hitters is similar to what one would predict from the consistent model.
So it is hard to identify truly streaky hitters when a hit is the success.
Strikeout as success: we find more observed streaky players than one would predict from simulations of the consistent model.
So patterns of strikeout streakiness are “interesting”.
This motivates a search for hitters who have streaky patterns over their careers.
Define the home run rate as the fraction of home runs among all batted balls (\(AB - SO\)): \[ \text{HR Rate} = \frac{HR}{AB - SO} \]
Look at the history of \(HR\) rates
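A direct translation of the HR rate formula; the season line in the example is invented for illustration.

```python
def hr_rate(hr: int, ab: int, so: int) -> float:
    """Home runs per batted ball: HR / (AB - SO)."""
    batted_balls = ab - so
    if batted_balls <= 0:
        raise ValueError("AB must exceed SO to have any batted balls")
    return hr / batted_balls


# Invented season line for illustration: 40 HR, 550 AB, 130 SO
print(round(hr_rate(40, 550, 130), 4))  # 0.0952
```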
In the fall of 2017, Major League Baseball charged a committee with identifying the potential causes of the increase in the rate at which home runs were hit from 2015 to 2017.
The committee released two reports (May 2018 and December 2019)
The batters?
The pitchers?
The ball?
Game conditions?
IN-PLAY: The batter has to put the ball in play
HIT IT RIGHT: The batted ball needs to have the “right” launch angle and exit velocity
REACH THE SEATS: Given the exit velocity and launch angle, the ball needs sufficient distance and height to clear the fence (the carry of the ball)
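Ignoring inside-the-park home runs, a home run is the event that all three steps succeed, so the chain rule of probability factors a batter's per-at-bat HR probability into one term per step. This identity is an illustration added here, not a formula from the committee reports:

\[ P(HR) = P(\text{IN-PLAY}) \, P(\text{HIT IT RIGHT} \mid \text{IN-PLAY}) \, P(\text{REACH THE SEATS} \mid \text{HIT IT RIGHT}, \text{IN-PLAY}) \]

The product of the last two factors is \(P(HR \mid \text{IN-PLAY})\), the quantity that the home run rate \(HR/(AB - SO)\) estimates.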
Focus on the in-play home run rates in July 2021 and July 2022
We observe a big drop in the HR rate
Is it due to the hitter’s approach?
Or is it due to the ball?
Express the logit of the home run probability as \[ \log \left(\frac{P(HR)}{1 - P(HR)}\right) = s(LA, LS) \]
\(s(\cdot, \cdot)\) is a smooth function of the launch angle (LA) and the launch speed (LS)
This generalized additive model (GAM) is a generalization of the linear regression model \(y = X \beta + \epsilon\)
Fit a GAM to the in-play HR data for July 2021
Use the model fit to predict the HR rate for July 2022 using the 2022 launch variables
By simulation, obtain a prediction distribution for the July 2022 HR rate
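The 2021-to-2022 comparison can be sketched as follows. Fitting an actual GAM would typically use a dedicated tool (for example, the `gam()` function in R's mgcv package); to keep this example self-contained, the fitted 2021 smooth \(s(LA, LS)\) is replaced by a hypothetical hand-written log-odds function with invented constants. The simulation redraws only the Bernoulli HR outcome of each 2022 ball in play; a fuller prediction distribution would also account for uncertainty in the fitted model.

```python
import math
import random
import statistics


def inv_logit(s):
    """Numerically stable inverse logit."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def placeholder_2021_model(la, ls):
    """HYPOTHETICAL stand-in for the July 2021 GAM fit.

    The log-odds below is invented (it peaks near a 28 degree launch
    angle and rises with launch speed); it is not estimated from data.
    """
    s = -1.0 + 0.3 * (ls - 100.0) - 0.02 * (la - 28.0) ** 2
    return inv_logit(s)


def prediction_distribution(bip, model, n_sims=1000, seed=2022):
    """Simulated HR rates for balls in play `bip` = [(LA, LS), ...].

    Only the Bernoulli outcomes are redrawn; the model is held fixed.
    """
    rng = random.Random(seed)
    probs = [model(la, ls) for la, ls in bip]
    rates = []
    for _ in range(n_sims):
        hrs = sum(rng.random() < p for p in probs)
        rates.append(hrs / len(probs))
    return rates


# Synthetic "2022" launch variables (made-up distribution, not Statcast data)
gen = random.Random(7)
bip_2022 = [(gen.gauss(12, 25), gen.gauss(88, 14)) for _ in range(400)]
rates = prediction_distribution(bip_2022, placeholder_2021_model)
cuts = statistics.quantiles(rates, n=20)  # 5%, 10%, ..., 95% cut points
print("90% range of predicted HR rate:", (cuts[0], cuts[-1]))
```

An observed 2022 rate below this whole range is the pattern described next: the 2022 launch conditions alone do not explain the drop, which points to the ball.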
The predictions are smaller than the observed 2021 rate, which indicates a change in the hitters' launch variables.
But the observed 2022 rate falls below the prediction distribution, which indicates that the ball was deader in 2022
Aaron Judge hit 62 home runs in 2022, a season when the ball was relatively dead
This raises the question: how many home runs would Judge have hit in a different season of the Statcast era?
Suppose the different season is 2019.
Fit a “2019 ball model” that predicts the probability of a HR in 2019 given values of the launch angle and exit velocity.
Collect Judge's launch variables for all of his balls in play (BIP) in 2022. For each BIP, predict P(HR) using the 2019 ball model.
Sum the probabilities to predict his total HR count.
We can also get a 90% prediction interval
If Judge had been hitting a 2019 ball, we predict he would have hit 75 home runs
A 90% prediction interval is (69, 81)
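Summing the per-BIP probabilities gives the predicted count. The sketch below attaches an approximate 90% interval by treating the balls in play as independent Bernoulli trials (a Poisson-binomial total) and using a normal approximation. This is an assumption made here, not necessarily how the interval above was computed; it ignores uncertainty in the fitted 2019 ball model, so it need not reproduce (69, 81). The probabilities in the example are placeholders, not Judge's data.

```python
import math


def predicted_hr_count(probs, z=1.645):
    """Predicted HR total and approximate 90% interval from per-BIP P(HR).

    Assumes independent Bernoulli(p_i) outcomes: the total is
    Poisson-binomial with mean sum(p_i) and variance sum(p_i * (1 - p_i)).
    The interval is a normal approximation (z = 1.645 for 90%).
    """
    mean = sum(probs)
    sd = math.sqrt(sum(p * (1.0 - p) for p in probs))
    return mean, (mean - z * sd, mean + z * sd)


# Placeholder probabilities for 400 balls in play (not Judge's data)
probs = [0.15] * 400
mean, (lo, hi) = predicted_hr_count(probs)
print(round(mean, 1), (round(lo, 1), round(hi, 1)))
```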
Use a GAM to predict P(HR) from the launch angle and exit velocity for one season
Use this ball model to predict HR probabilities from Judge's 2022 launch variables
Sum the predicted probabilities
Judge hit only 62 home runs in 2022
But if he had played in a different season when the ball was livelier (more carry), we predict his 2022 count would be in the 70s
So Judge's home run achievement is understated
Because of this ball bias, we don't appreciate the magnitude of Judge's accomplishment
Two important factors in home run hitting are the hitters (values of launch variables) and the ball (carry or drag coefficient).
Batters are stronger and changing their hitting approach, leading to higher rates of “HR friendly” balls in play.
The composition of the ball has gone through dramatic changes during the Statcast era.
Currently the ball is relatively dead compared to previous seasons.